Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

a data set into sub-data sets, each of which is a cluster. When it

is used to investigate how genes function within a genome under a

stress, cluster analysis can be used to partition genes into groups according

to their measured response strengths so as to discover how the genes

in different ways. For this reason, cluster analysis has

been very popular in many biological/medical pattern discovery tasks [Le

al., 2003; Liu, et al., 2005; Huang and Pan, 2006; Pan, 2006;

and Saito, 2013]. Cluster analysis has been used to examine how

endent conformation changes [Chaturvedi, et al., 2020]. In a lung

cancer study, the use of cluster analysis has discovered that malignant

ioma tumours demonstrate high ROR1 expression [Miyake, et al.,

o examine whether bovine granulocytes is a symptom of

n of liver functions, cluster analysis has been used to confirm the

relationship of gene expression between bovine granulocytes and

n of liver functions [Kizaki, et al., 2020]. The molecules which

have similar biological functions in similar biological pathways are

expected to show similar activities in an experiment. A cluster analysis

has revealed different numbers of differentially expressed genes

responding to the same drought stress in wheat, maize and rice [Wei and

18].

of these studies are hypothesis-driven and have little or no a

priori knowledge about what a cluster model should look like in advance.

In other words, no knowledge will tell which group of molecules should

demonstrate a similar response to a stress before a pattern analysis process

using a machine learning algorithm starts. Cluster analysis is thus a good

approach to be used for this kind of pattern analysis.
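
As a minimal sketch of this kind of analysis, the following R code groups genes by their measured responses without being told in advance which genes belong together. The expression matrix `expr`, the gene and condition names, the use of Euclidean distance with average-linkage hierarchical clustering, and the number of clusters (k = 2) are all assumptions made here for illustration; they are not taken from the studies cited above.

```r
# Hypothetical data: 6 genes (rows) measured under 4 stress conditions (columns).
# Three genes respond weakly (values near 0) and three respond strongly
# (values near 3). The clustering step below is NOT told which is which.
set.seed(1)
expr <- rbind(
  matrix(rnorm(12, mean = 0, sd = 0.2), nrow = 3),  # weakly responding genes
  matrix(rnorm(12, mean = 3, sd = 0.2), nrow = 3)   # strongly responding genes
)
rownames(expr) <- paste0("gene", 1:6)
colnames(expr) <- paste0("condition", 1:4)

# Pairwise distances between genes (Euclidean is an assumption of this sketch).
d <- dist(expr, method = "euclidean")

# Agglomerative hierarchical clustering with average linkage.
tree <- hclust(d, method = "average")

# Cut the tree into k = 2 clusters; k is assumed here, not learned from data.
groups <- cutree(tree, k = 2)
print(groups)
```

The cluster labels in `groups` come from the data alone: genes with similar response profiles receive the same label, which is the sense in which no prior knowledge of the grouping is required.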

To cluster data, the first issue is how to measure the distances between

data points. Different clustering algorithms may use different metrics for

measuring the distances or the similarities between data points for

different types of data. Among them, the Euclidean distance, the Hamming